Community Detection

Data Source: https://github.com/benedekrozemberczki/datasets#twitch-social-networks

Import packages, data and build the graph

Plot original Network before Community Detection

Community Detection Algorithms for Large Network

Size of the Network:

Computing Resources:

Analyze accuracy and efficiency of different algos on the large Twitch User Network

The transitivity or clustering coefficient of a network is a measure of the tendency of the nodes to cluster together. High transitivity means that the network contains communities or groups of nodes that are densely connected internally.

Modularity-based communities

The Clauset-Newman-Moore greedy modularity maximization has O(N log^2 N) complexity and found 101 communities in 41.8 s. The partitions have modularity 0.41 which is the second highest amoung the tested algos.

The naive_greedy_modularity_communities has O(N^4) computational complexity which is much slower than the alternative and it failed to detect communities in 3 hours.

naive_greedy_modularity_communities

Louvain Community Detection

Louvain is an efficient algo with O(NlogN) computational complexity.

The algo indentified 21 communities in 1.67 s given our computing resources.

The partitions have modularity 0.45 which is the highest amoung the algos we used.

Girvan-Newman algorithm

Partitions via centrality measures

The Algo is one of the slowest with high computational complexity O(N^3).

Given our computing resources, the algo failed to find communities in 3 hours.

Label Propagation

label propagation has significant advantages in its running time. It is the most efficient amoung the algos we tested.

However, the alog doesn't produce unique solution. It tends to classify the network (total 7126 nodes) into one big community (e.g. 6352 nodes) and many small communities.

Fluid Communities

Asynchronous Fluid Communities algorithm for community detection.

FluidC together with Label Propagation (LP) are the fastest and most scalable algorithms with a linear computational complexity of O(N). FluidC identified 10 communities in 759 ms. However, FluidC has better performance with 0.398 modularity than LP and it doesn't have the tendency to classfy the Network into one big community and many small ones.

FluidC avoids the creation of monster communities in a non-parametric manner. Due to the spread of density among the vertices that compose a community, a large community (when compared to the rest of communities in the graph) will only be able to keep its size and expand by having a favourable topology (i.e., having lots of intra-community edges which make up for its lower density).

FluidC Paper: https://arxiv.org/pdf/1703.09307.pdf

Measuring partitions

Other Potentials Topics to discuss

Compare Leiden's improvements over Louvain

Walktrap: random walks in the network

Reference:

https://graphsandnetworks.com/community-detection-using-networkx/

https://networkx.org/documentation/stable/reference/algorithms/community.html

https://arxiv.org/pdf/1703.09307.pdf

https://link.springer.com/article/10.1007/s00607-021-01027-4

https://www.researchgate.net/publication/351058899_LPA-MNI_An_Improved_Label_Propagation_Algorithm_Based_on_Modularity_and_Node_Importance_for_Community_Detection

Leiden/infomap https://theses.liacs.nl/pdf/2019-2020-VerweijGeerten.pdf